Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making

نویسنده

  • Andrew Critch
چکیده

Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine’s policy will prioritize each player’s interests over time. Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player’s own beliefs in evaluating how well an action will serve that player’s utility function, and (2) shift the relative priority it assigns to each player’s expected utilities over time, by a factor proportional to how well that player’s beliefs predict the machine’s inputs. Observation (2) represents a substantial divergence from näıve linear utility aggregation (as in Harsanyi’s utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making

It is often argued that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a {\em Pareto-optimal} policy, i.e., a policy that cannot be improved upon for one agent without making sacrifices for another. A famous theorem of Harsanyi shows that, when the principals have a common prior on the outcome distributions of all policies, a Pare...

متن کامل

Self-Optimizing and Pareto-Optimal Policies in General Environments Based on Bayes-Mixtures

The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle t action yt results in perception xt and reward rt, where all quantities in general may depend on the complete history. The perception xt and reward rt are sampled from the (reactive) environmental probability distribution . This very general setting includes, but is not limited to, (partial ob...

متن کامل

Sequential Decision Making for Profit Maximization Under the Defection Probability Constraint in Direct Marketing

Direct marketing is one of the most effective marketing methods with an aim to maximize the customer's lifetime value. Many cost-sensitive learning methods which identify valuable customers to maximize expected profit have been proposed. However, current cost-sensitive methods for profit maximization do not identify how to control the defection probability while maximizing total profits over th...

متن کامل

Are People Successful at Learning Sequential Decisions on a Perceptual Matching Task?

Sequential decision-making tasks are commonplace in our everyday lives. We report the results of an experiment in which human subjects were trained to perform a perceptual matching task, an instance of a sequential decision-making task. We use two benchmarks to evaluate the quality of subjects’ learning. One benchmark is based on optimal performance as defined by a dynamic programming procedure...

متن کامل

Non-Deterministic Policies In Markovian Processes

Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct adaptive treatment strategies, where a sequence of individualized treatments is learned from clinic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1701.01302  شماره 

صفحات  -

تاریخ انتشار 2017